Revealing the Detailed Lineage of Script Outputs using Hybrid Provenance

Authors

  • Qian Zhang
  • Yang Cao
  • Qiwen Wang
  • Duc Vu
  • Priyaa Thavasimani
  • Timothy McPhillips
  • Paolo Missier
  • Peter Slaughter
  • Christopher Jones
  • Matthew B. Jones
  • Bertram Ludäscher

Abstract

We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance (i.e., the conceptual workflows latent in scripts) via simple YesWorkflow annotations, embedded as script comments. Runtime observables, whether hidden in filenames or folder structures, recorded in log files, or automatically captured using tools such as noWorkflow or the DataONE RunManagers, can be linked to prospective provenance via relational views and queries. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
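
The two ideas in the abstract, annotations embedded as comments and runtime observables recovered from filenames, can be illustrated with a minimal Python sketch. The `@begin`, `@in`, `@out`, `@uri`, and `@end` keywords are YesWorkflow annotations. Everything else is an assumption made for illustration: the step name `clean_readings`, the data names, the file layout, and the helpers `template_to_regex` and `link_files`. The helpers are a simplified stand-in written for this example, not the YesWorkflow toolkit's own implementation.

```python
import re
import sqlite3

# YesWorkflow annotations live in ordinary comments, so the script runs unchanged.
# The step name, data names, and URI templates below are hypothetical.

# @begin clean_readings
# @in raw_readings @uri file:raw/{station}.csv
# @out cleaned_readings @uri file:clean/{station}_{year}.csv
def clean_readings(rows):
    """Drop rows with a missing value (stand-in for a real cleaning step)."""
    return [r for r in rows if r.get("value") is not None]
# @end clean_readings


def template_to_regex(template):
    """Turn 'clean/{station}_{year}.csv' into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        # Odd positions hold the variable names captured by the split.
        pattern += f"(?P<{part}>[^/_]+)" if i % 2 else re.escape(part)
    return re.compile(pattern + r"\Z")


def link_files(declared_outputs, observed_paths):
    """Join declared outputs (prospective) with observed files (retrospective)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE declared (step TEXT, data TEXT, template TEXT)")
    db.execute("CREATE TABLE observed (template TEXT, path TEXT, var TEXT, value TEXT)")
    db.executemany("INSERT INTO declared VALUES (?, ?, ?)", declared_outputs)
    for _, _, template in declared_outputs:
        regex = template_to_regex(template)
        for path in observed_paths:
            match = regex.match(path)
            if match:
                db.executemany(
                    "INSERT INTO observed VALUES (?, ?, ?, ?)",
                    [(template, path, k, v) for k, v in match.groupdict().items()],
                )
    query = """
        SELECT d.step, d.data, o.path, o.var, o.value
        FROM declared AS d JOIN observed AS o ON o.template = d.template
        ORDER BY o.path, o.var
    """
    return db.execute(query).fetchall()


links = link_files(
    [("clean_readings", "cleaned_readings", "clean/{station}_{year}.csv")],
    ["clean/oslo_2016.csv", "notes/readme.txt"],
)
```

Each returned row ties a declared output of a workflow step to a file observed on disk and to one variable value recovered from that file's name. Files that match no declared template, such as `notes/readme.txt` here, are simply left out of the join.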

Similar resources

Linking Prospective and Retrospective Provenance in Scripts

Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have been recently proposed to capture the provenance of script executions. The provenance information recorded can be used, e.g., to trace the lineage of a particular result by ident...

Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to...

Data Provenance: A Categorization of Existing Approaches

In many application areas like e-science and data-warehousing detailed information about the origin of data is required. This kind of information is often referred to as data provenance or data lineage. The provenance of a data item includes information about the processes and source data items that lead to its creation and current representation. The diversity of data representation models and...

Tracking and Analyzing the Evolution of Provenance from Scripts

Script languages are powerful tools for scientists. Scientists use them to process data, invoke programs, and link program outputs/inputs. During the life cycle of scientific experiments, scientists compose scripts, execute them, and perform analysis on the results. Depending on the results, they modify their script to get more data to confirm the original hypothesis or to test a new hypothesis...

Who Is the Author of Anis al-Tālebin wa ‘Oddat al-Sālekin?

Anis al-Tālebin wa ‘Oddat al-Sālekin is a book in two scripts, a detailed one and a short one, about the states and thoughts of Khājeh Bahā al-Din Naqshband. In the detailed script, Salāh ibn Mobārak Bukhāri is explicitly named as the author, but in sources and indices this book has been ascribed to Khājeh Muhammad Pārsā and Hesām ibn Yousof Bukhāri. The ascription of this book to Hesām ibn You...

Publication date: 2017